Abstract

A property observed in high-reliability fault-tolerant control systems is that component
failures occur rarely compared with the frequent redundancy management decision events.
This property leads to a temporal decomposition of the semi-Markov chain reliability model
into two time scales: a slow time scale for failure events and a fast time scale for FDI
events. Conditions are described under which a perturbed semi-Markov chain can be
approximated by an enlarged Markov process, the parameters of which are derived from the
parameters of the semi-Markov chain.

1  Introduction 

A typical fault-tolerant control system (FTCS) is composed of many highly reliable redundant
components including sensors, actuators, power supplies and computers. These components are
networked in a hierarchical architecture, and their use is governed by a redundancy
management (RM) policy. Failure detection and isolation (FDI) logic is implemented to
indicate to the RM system which components are no longer safely usable.
It has been demonstrated [1,2] that the reliability and availability of an FTCS can
be computed using a finite-state generalized Markov (that is, Markov or semi-Markov)
reliability model. These calculations are often difficult or impossible to accomplish by
classical combinatorial methods due to time-ordered event sequences that are a consequence
of the RM policy and FDI logic. If sequential tests are used to detect failures [3], then a
semi-Markov chain reliability model must be used to predict the system reliability.

Many  methods  exist  for  the  simplified  analysis  of  the  steady  state  behavior  of  generalized 
Markov chain models. However, generalized Markov chain models of FTCS invariably
contain  one  or  more  trapping  states  that  represent  system  loss.  Thus,  the  steady  state 
behavior  is  of  no  interest  because  the  steady  state  condition  will  certainly  be  system  loss. 
It  is  the  transient  behavior  of  these  models  that  is  of  interest. 

A  generalized  Markov  chain  is  characterized  by  a  discrete  set  of  states  and  an  arbitrary 
distribution  of  the  holding  or  sojourn  time  for  each  transition.  The  semi-Markov  chain 
specializes  to  a  Markov  chain  when  the  holding  times  are  geometrically  distributed  and 
identically  distributed  for  all  transitions  exiting  a  particular  state. 

The result that must be routinely computed in analyzing the reliability model is the
interval transition probability, $\phi_{ij}(n)$, which is the probability that the model
occupies state j at time n given that it entered state i at the initial time. For FTCS, the
states represent a complete characterization of the condition of the system. Thus, if all of
the $\phi_{ij}(n)$ that correspond to system loss configurations for j can be computed for n
corresponding to the finite duration of the mission, then the probability of an unsuccessful
mission can be computed.

Once  the  interval  transition  probabilities  have  been  determined  for  a  particular  time 
n,  the  probability  of  occupying  each  state  can  be  determined  if  the  initial  state  occupancy 
probabilities are known. Let $\pi(n)$ be the state probability distribution at time n. If
$\pi(0)$ is known, then

$$\pi(n) = \pi(0)\,\Phi(n) \qquad (1)$$


In  the  context  of  the  FTCS,  the  first  state  is  routinely  chosen  to  represent  the  situation  where 
all  components  are  working.  Usually,  the  system  occupies  the  first  state  with  probability 
one  at  the  initial  time. 


The interval transition probabilities are generated by the semi-Markov chain recursion
formula [4]:

$$\Phi(n) = {}^{>}W(n) + \sum_{m=0}^{n} G(m)\,\Phi(n-m); \quad \text{IC: } \Phi(0) = I \qquad (2)$$
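The recursion can be sketched numerically as follows. This is a minimal illustration, not the paper's implementation: it assumes that G(0) is the zero matrix, that the waiting-time term is the diagonal matrix of probabilities that the holding time in each state exceeds n, and it uses a hypothetical two-state chain with one-step holding times, for which the interval transition matrix reduces to powers of the one-step matrix. The names `mat_mul` and `interval_transition` are illustrative.

```python
def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def interval_transition(G, n_max):
    """Return [Phi(0), ..., Phi(n_max)] for core matrices G[m], m = 0..len(G)-1.

    Assumption: G[0] is the zero matrix (no probability mass at zero holding time).
    """
    N = len(G[0])
    zero = [[0.0] * N for _ in range(N)]

    def core(m):
        return G[m] if m < len(G) else zero

    # Initial condition Phi(0) = I.
    Phi = [[[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]]
    # Diagonal of the waiting-time matrix: Pr{holding time in state i exceeds n}.
    still_waiting = [1.0] * N
    for n in range(1, n_max + 1):
        still_waiting = [w - sum(core(n)[i]) for i, w in enumerate(still_waiting)]
        phi_n = [[still_waiting[i] if i == j else 0.0 for j in range(N)]
                 for i in range(N)]
        for m in range(1, n + 1):  # the m = 0 term vanishes because G(0) = 0
            term = mat_mul(core(m), Phi[n - m])
            phi_n = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(phi_n, term)]
        Phi.append(phi_n)
    return Phi


# Hypothetical example: every holding time is one step, so Phi(n) = P**n.
P = [[0.9, 0.1], [0.0, 1.0]]
G = [[[0.0, 0.0], [0.0, 0.0]], P]
Phi = interval_transition(G, 3)
pi_3 = mat_mul([[1.0, 0.0]], Phi[3])[0]  # state probabilities from pi(0) = [1, 0]
```

The inner loop over m is what makes the cost grow quadratically with the time index, since every earlier interval transition matrix must be retained and revisited.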

Taking z-transforms of both sides of (2) and solving for $\Phi(z)$:

$$\Phi(z) = [I - G(z)]^{-1}\,{}^{>}W(z) \qquad (3)$$

The z-transform of the state occupancy probability distribution is

$$\pi(z) = \pi(0)\,[I - G(z)]^{-1}\,{}^{>}W(z) \qquad (4)$$

which follows directly from (2). The inverse matrix of [I - G(z)] always exists for a
semi-Markov chain. The inverse transform of either $\pi(z)$ or $\Phi(z)$ can be found using
standard partial fraction expansion techniques. However, for all but the simplest of
situations, transform methods are useless in a practical sense.

In practice, the interval transition probability matrix is nearly always found by performing
the semi-Markov recursion numerically. For a model with N states, computation of $\Phi(n)$
requires storage of $2nN^2$ values because both $\Phi(n)$ and G(n) must be stored for all
times prior to and including time n. A reliability model for a typical inertial navigation
system might have twenty states, a sampling period of 200 ms, and a two hour mission time.
This would require storage of $2.88 \times 10^{7}$ single precision values and require 230
megabytes of storage. Moreover, the number of floating point multiplications required to
compute $\pi(n)$ from $\pi(0)$ is about $n^2N^2/2$, which is $2.59 \times 10^{11}$ for the
example described above. Thus, the computational burden and memory requirements are
tremendous even for a simple system.
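The figures for the navigation example can be reproduced with a few lines of arithmetic. The sketch below assumes that the stored-value count is 2nN^2 and that the multiplication count scales as n^2 N^2 / 2 (a vector-matrix product per retained time step); the function name `recursion_cost` is illustrative.

```python
def recursion_cost(num_states, sample_period_s, mission_time_s):
    """Return (time steps, stored values, multiplications) for the recursion."""
    n = round(mission_time_s / sample_period_s)
    stored_values = 2 * n * num_states**2          # Phi and G for every time step
    multiplications = n**2 * num_states**2 / 2     # assumed scaling, see lead-in
    return n, stored_values, multiplications


# Twenty states, 200 ms sampling period, two hour mission.
n_steps, stored, mults = recursion_cost(20, 0.2, 2 * 3600)
```

For these inputs the mission spans 36000 time steps, which gives 2.88e7 stored values and about 2.59e11 multiplications.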

The  problem  to  be  addressed  in  this  paper  is  to  substantially  reduce  the  computational 
burden  while  preserving  the  accuracy  of  reliability  and  availability  calculations. 

One  possible  means  for  doing  this  is  direct  Monte  Carlo  techniques.  If  a  sufficient 
number  of  Monte  Carlo  simulations  are  made  of  system  operations  to  account  correctly  for 
all  possible  random  events  that  bear  on  the  reliability  calculation,  then  any  aspect  of  system 
performance  can  be  evaluated.  To  obtain  meaningful  results  for  high  reliability  systems 
with events that occur with probabilities as low as 1 x 10~® (typical of the probability of a
component failure over a single time step), over one billion simulations must be performed.
This task is as formidable as evaluating the semi-Markov chain recursion for large values of
the  time  index.  Consequently,  reliability  calculations  via  direct  Monte  Carlo  methods  also 
have  prohibitive  computational  costs. 

Lewis suggested in [5,6] that a modified Monte Carlo approach be used for high reliability
systems.  Again,  failure  events  are  assumed  to  be  extremely  rare  relative  to  other  events 
that  occur  in  the  system.  Thus,  the  vast  majority  of  simulations  will  be  those  for  which 
no failures occur. Lewis assumes that all events have exponentially distributed times of
occurrence  and  can  be  modeled  by  a  Markov  chain.  It  is  possible  to  sample  the  failure 
distributions  before  a  simulation  is  initiated  to  determine  if  any  failures  will  occur  during 
the  mission.  If  all  failures  occur  after  the  mission  has  been  completed  (which  is  usually 
the  case),  then  a  normal  simulation  results.  If  a  failure  occurs  during  the  mission,  then 
the  complete  simulation  must  be  performed  including  FDI  decisions,  decision  errors,  and 
repairs.  However,  this  approach  does  not  apply  to  semi-Markov  chains  because  FDI  events 
arising  from  a  sequential  FDI  test  are  not  exponentially  distributed.  In  these  cases,  a 
complete  simulation  must  always  be  run  and  no  benefits  are  derived  from  the  modified 
technique. 

Another approach that exploits the rare occurrence of failure events is suggested by
Trivedi in [7,8]. The model is based upon a time-scale decomposition of the system into
virtually disjoint fault-occurrence and fault-handling submodels. The fault-handling
submodels represent aggregated states and the failure occurrence submodels dictate the
behavior between these aggregated states. The reliability of the system predicted by the
aggregated model is then computed using Markov or Monte Carlo techniques. However, the only
fault-handling events that are accounted for are detections and missed detections following
actual faults. A common FDI event that cannot be treated by these hybrid models is the false
alarm, which occurs in the absence of a fault. Therefore, this approach is limited to systems
where false alarms cannot occur.

In this paper, the rarity of component failures relative to RM decision events will be
exploited in the development of an approximate method for evaluating semi-Markov chain
reliability models of fault-tolerant control systems.


2  A  Limit  Theorem  for  Semi-Markov  Chains 


Theorem 1 describes how a perturbed semi-Markov chain, which is dependent on a small
parameter $\varepsilon$ in a certain way, can be described asymptotically by an enlarged
Markov process as $\varepsilon \to 0$. This theorem is an extension of the results for
discrete parameter semi-Markov processes stated in [9].

The semi-Markov chain depends on a small parameter $\varepsilon$ such that the entire state
space of the semi-Markov chain can be decomposed into disjoint classes of states where the
probabilities of departure from each class tend to zero with $\varepsilon$. Also, the total
sojourn in each class is assumed to have a non-degenerate distribution in the limit as
$\varepsilon \to 0$. (When $\varepsilon = 0$, the chain will be referred to as the
unperturbed semi-Markov chain while the $\varepsilon$-dependent chain will be referred to as
the perturbed semi-Markov chain.)


Theorem 1 (Limit Theorem for Semi-Markov Chains) Let the set E of states of the
semi-Markov chain be expressible as a union of disjoint classes:

$$E = \sum_{k=1}^{N^*} E_k \qquad (5)$$

Let $\tau_{kr}^{(i)}$ be the sojourn of the semi-Markov chain in class $E_k$ when it starts
from state $i \in E_k$ and moves to class $E_r$ where $r \neq k$. If the following two
conditions hold for the semi-Markov chain E:

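1. The elements of the core matrix sequence $\{g_{ij}^{\varepsilon}(n) \mid i,j \in E\}$
specifying the semi-Markov chain depend as follows on the small parameter $\varepsilon$:

$$g_{ij}^{\varepsilon}(n) = p_{ij}^{\varepsilon}\, h_{ij}(n/\varepsilon) \qquad (6)$$

$$p_{ij}^{\varepsilon} = \begin{cases} p_{ij}^{(k)} - \varepsilon q_{ij}^{(k)} + o(\varepsilon) & \text{if } i \in E_k \text{ and } j \in E_k \\ \varepsilon q_{ij}^{(k)} + o(\varepsilon) & \text{if } i \in E_k \text{ and } j \notin E_k \end{cases} \qquad (7)$$

$$p_{ij}^{(k)} \ge 0, \qquad \sum_{j \in E_k} p_{ij}^{(k)} = 1 \qquad (8)$$

2. The embedded Markov chains defined by the matrices
$\{p_{ij}^{(k)} \mid i,j \in E_k,\ k \in M\}$ are ergodic with stationary distributions
$\{\pi_i^{(k)} \mid i \in E_k,\ k \in M\}$.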

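Then:

$$\lim_{\varepsilon \to 0} \Pr\{\tau_{kr}^{(i)} \le t\} = \gamma_{kr}\,\{1 - \exp\{-\Lambda_k t\}\} \qquad (9)$$

where:

$$\gamma_{kr} = \frac{\sum_{i \in E_k} \pi_i^{(k)} q_{ir}^{(k)}}{\sum_{i \in E_k} \pi_i^{(k)} q_i^{(k)}} \qquad (10)$$

$$\Lambda_k = \frac{\sum_{i \in E_k} \pi_i^{(k)} q_i^{(k)}}{\sum_{i \in E_k} \pi_i^{(k)} \tau_i^{(k)}} \qquad (11)$$

Here:

$$q_{ir}^{(k)} = \sum_{j \in E_r} q_{ij}^{(k)} \qquad (12)$$

$$q_i^{(k)} = \sum_{j \notin E_k} q_{ij}^{(k)} \qquad (13)$$

$$\tau_i^{(k)} = \sum_{j \in E_k} p_{ij}^{(k)}\, \bar{\tau}_{ij} \qquad (14)$$

$$\bar{\tau}_{ij} = \sum_{n=0}^{\infty} n\, h_{ij}(n) \qquad (15)$$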
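The parameters of the enlarged Markov process can be computed for one class with a short routine. The sketch below assumes the reading that the branching probability toward a destination class is the stationary-weighted share of the out-of-class coefficients, and that the exit rate is the stationary-weighted exit coefficient divided by the stationary-weighted mean holding time. The function names and the numerical inputs are hypothetical.

```python
import math


def enlarged_parameters(pi, q_out, tau):
    """Return (gamma, lam) for one class.

    pi[i]    : stationary probability of state i within the class
    q_out[i] : dict mapping destination class r to the coefficient q_ir
    tau[i]   : mean holding time of state i within the class
    """
    q_total = [sum(row.values()) for row in q_out]            # q_i
    exit_weight = sum(p * q for p, q in zip(pi, q_total))     # sum_i pi_i q_i
    destinations = sorted({r for row in q_out for r in row})
    gamma = {r: sum(p * row.get(r, 0.0) for p, row in zip(pi, q_out)) / exit_weight
             for r in destinations}
    lam = exit_weight / sum(p * t for p, t in zip(pi, tau))
    return gamma, lam


def class_transition_cdf(gamma_kr, lam, t):
    """Limiting class-to-class transition CDF, gamma * (1 - exp(-lam * t))."""
    return gamma_kr * (1.0 - math.exp(-lam * t))


# Hypothetical two-state class that can exit to classes 2 and 3.
gamma, lam = enlarged_parameters(
    pi=[0.75, 0.25],
    q_out=[{2: 1.0, 3: 1.0}, {2: 3.0, 3: 1.0}],
    tau=[1.0, 3.0],
)
```

The branching probabilities sum to one over the destination classes, so the enlarged process is a proper continuous-time Markov process on the classes.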

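PROOF: Let $\varepsilon\zeta_{ij}$ denote the integer valued sojourn of the semi-Markov chain
in state i with next transition to state j with the holding time distribution
$h_{ij}(n/\varepsilon)$, while the $\xi_{ij}$ are the transition indicators from state i to
state j. The probability distribution of the random quantities $\tau_{kr}^{(i)}$ can be
expressed in terms of total probability as

$$\Pr\{\tau_{kr}^{(i)} \le n\} = \sum_{j \in E_k} \Pr\{\xi_{ij} = 1,\ \varepsilon\zeta_{ij} + \tau_{kr}^{(j)} \le n\} + \sum_{j \in E_r} \Pr\{\xi_{ij} = 1,\ \varepsilon\zeta_{ij} \le n\} \qquad (16)$$

Defining the interval transition CDF as

$${}^{\le}\phi_{kr}^{(i)}(n) = \Pr\{\tau_{kr}^{(i)} \le n\} \qquad (17)$$

then

$${}^{\le}\phi_{kr}^{(i)}(n) = \sum_{j \in E_k} \sum_{m=0}^{n} g_{ij}^{\varepsilon}(m)\, {}^{\le}\phi_{kr}^{(j)}(n-m) + \sum_{j \in E_r} {}^{\le}g_{ij}^{\varepsilon}(n) \qquad (18)$$

Taking z-transforms of both sides yields:

$${}^{\le}\phi_{kr}^{(i)}(z) = \sum_{j \in E_k} g_{ij}^{\varepsilon}(z)\, {}^{\le}\phi_{kr}^{(j)}(z) + \left\{\frac{z}{z-1}\right\} \sum_{j \in E_r} g_{ij}^{\varepsilon}(z) \qquad (19)$$

The z-transforms of the $g_{ij}^{\varepsilon}(n)$ must be evaluated to first order in
$\varepsilon$. From (6) and the definition of the z-transform [10]: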


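$$g_{ij}^{\varepsilon}(z) = p_{ij}^{\varepsilon} \sum_{n=0}^{\infty} h_{ij}(n/\varepsilon)\, z^{-n} \qquad (20)$$

Note that $p_{ij}^{\varepsilon}$ has been moved in front of the summation sign because it
does not depend on time. Let $m = n/\varepsilon$ and expand $z^{-\varepsilon m}$ in a Taylor
series about $\varepsilon = 0$. Then:

$$g_{ij}^{\varepsilon}(z) = p_{ij}^{\varepsilon} \sum_{m=0}^{\infty} \{1 - \varepsilon m \log z\}\, h_{ij}(m) + o(\varepsilon) \qquad (21)$$

where $o(\varepsilon)$ represents terms such that in the limit as $\varepsilon \to 0$, the
quantity $o(\varepsilon)/\varepsilon$ approaches zero. Noting that:

$$\sum_{n=0}^{\infty} h_{ij}(n) = 1 \qquad (22)$$

$$\sum_{n=0}^{\infty} n\, h_{ij}(n) = \bar{\tau}_{ij} \qquad (23)$$

and substituting $p_{ij}^{\varepsilon}$ from (7) and combining terms of $o(\varepsilon)$
yields:

$$g_{ij}^{\varepsilon}(z) = \begin{cases} p_{ij}^{(k)} - \varepsilon\,[\,q_{ij}^{(k)} + p_{ij}^{(k)} \bar{\tau}_{ij} \log z\,] + o(\varepsilon) & \text{if } i \in E_k \text{ and } j \in E_k \\ \varepsilon q_{ij}^{(k)} + o(\varepsilon) & \text{if } i \in E_k \text{ and } j \notin E_k \end{cases} \qquad (24)$$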

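Incorporating these results into (19) and placing all terms proportional to $\varepsilon$ on
the RHS:

$${}^{\le}\phi_{kr}^{(i)}(z) - \sum_{j \in E_k} p_{ij}^{(k)}\, {}^{\le}\phi_{kr}^{(j)}(z) = -\varepsilon \sum_{j \in E_k} [\,q_{ij}^{(k)} + p_{ij}^{(k)} \bar{\tau}_{ij} \log z\,]\, {}^{\le}\phi_{kr}^{(j)}(z) + \varepsilon \left\{\frac{z}{z-1}\right\} \sum_{j \in E_r} q_{ij}^{(k)} + o(\varepsilon) \qquad (25)$$

Now, passing to the limit as $\varepsilon \to 0$, the RHS vanishes and the
${}^{\le}\phi_{kr}^{(i)}(z)$ are found to satisfy the system of equations below:

$${}^{\le}\phi_{kr}^{(i)}(z) - \sum_{j \in E_k} p_{ij}^{(k)}\, {}^{\le}\phi_{kr}^{(j)}(z) = 0 \qquad (26)$$

Let $P_k = [p_{ij}^{(k)}]$ represent the embedded Markov chain operator in class $E_k$ of the
unperturbed semi-Markov chain E. With ${}^{\le}\phi_{kr}(z)$ denoting the column vector of
the ${}^{\le}\phi_{kr}^{(i)}(z)$, the system of equations in (26) can be expressed as:

$${}^{\le}\phi_{kr}(z) = P_k\, {}^{\le}\phi_{kr}(z) \qquad (27)$$

After successive premultiplication by $P_k$, and taking the limit as $n \to \infty$:

$${}^{\le}\phi_{kr}(z) = \lim_{n \to \infty} P_k^{\,n}\, {}^{\le}\phi_{kr}(z) \qquad (28)$$

Under Condition 2, the ergodic theorem for Markov chains [11] implies that:

$$\lim_{n \to \infty} P_k^{\,n} = \Pi_k \qquad (29)$$

where every row of $\Pi_k$ is the stationary distribution of class $E_k$, so that the
solution to (27) is independent of the superscript:

$${}^{\le}\phi_{kr}^{(i)}(z) = {}^{\le}\phi_{kr}(z) \quad \forall\, i \in E_k,\ \forall\, k \in M \qquad (30)$$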

Now, (25) is of the form $f(x) = g(x, \varepsilon)$, that is, the LHS is not a function of
$\varepsilon$ and is therefore constant with respect to $\varepsilon$. However, as
$\varepsilon \to 0$, the RHS approaches zero so that the LHS must be zero for all values of
$\varepsilon$. Canceling $\varepsilon$ from the result, multiplying by the stationary
probabilities $\pi_i^{(k)}$ of the unperturbed semi-Markov chain in class k, and summing over
$i \in E_k$ yields:

$$\sum_{i \in E_k} \pi_i^{(k)} \sum_{j \in E_k} [\,q_{ij}^{(k)} + p_{ij}^{(k)} \bar{\tau}_{ij} \log z\,]\, {}^{\le}\phi_{kr}^{(j)}(z) = \left\{\frac{z}{z-1}\right\} \sum_{i \in E_k} \pi_i^{(k)} \sum_{j \in E_r} q_{ij}^{(k)} + o(1) \qquad (31)$$

On passing again to the limit as $\varepsilon \to 0$, noting that all of the
${}^{\le}\phi_{kr}^{(j)}(z)$ have the limit function ${}^{\le}\phi_{kr}(z)$, and solving for
${}^{\le}\phi_{kr}(z)$, the z-transform of the class-to-class transition PMF becomes:

$$\phi_{kr}(z) = \gamma_{kr}\, \frac{\Lambda_k}{\log z + \Lambda_k} \qquad (32)$$

The mapping from the z domain to the s domain (Laplace) is given by $s = (\log z)/T$.
Dividing top and bottom by the sampling period T, and applying the transformation concludes
the proof. □

In summary, Theorem 1 describes the conditions under which a perturbed semi-Markov
chain can be approximated by an enlarged Markov process that evolves in the slow time
scale, and also states how the parameters of the Markov process are determined from the
parameters of the semi-Markov chain. In the context of FTCS, the fast time scale behavior
within a class would represent FDI decision and RM events while the slower class-to-class
behavior would represent the occurrence of failures. The class-to-class interval transition
CDF ${}^{\le}\Phi_{kr}(t)$ that results is a continuous time envelope of the behavior between
the classes. This interpretation is intuitively satisfying since failures are invariably
assumed to have exponentially distributed times of occurrence over continuous time.

However, two problems occur in the application of Theorem 1 to FTCS models: (1) the
embedded Markov chains for each class of the unperturbed model are rarely ergodic, and
(2) the holding time PMFs are usually functions of n, not $n/\varepsilon$, that is, the
holding times are typically not on the order of the mean time to a component failure. The
requirement that the embedded Markov chains of the unperturbed classes be ergodic is
important in producing (26) and guarantees the existence of the stationary probabilities
$\{\pi_i^{(k)} \mid i \in E_k,\ k \in M\}$.

The ergodicity condition can be relaxed in much the same way as was done in [12] for
semi-Markov processes. This will be accomplished in Lemma 2 and Lemma 3. The second
problem can be mitigated by introducing time-scaling into Theorem 1, as will be done in
Theorem 4.

3  Relaxation  of  the  Ergodicity  Condition 

Lemma 2 discusses how the existence of the Cesàro limit of the embedded Markov chain
operator leads to a relaxation of the ergodicity condition.

Lemma 2 Consider a semi-Markov chain state space E that can be expressed as a sum of
disjoint classes according to (5) and (7). Let $P_k = [p_{ij}^{(k)}]$ represent the embedded
Markov chain operator for class $E_k$. The solution of (26) is independent of the superscript
(and the results of Theorem 1 hold), if the Cesàro limit exists:

$$\lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} P_k^{\,m} = \Pi_k \qquad (33)$$

PROOF: The system of equations in (26) can be expressed in matrix form as is done
in (27). Successively premultiplying both sides by $P_k$, and averaging an infinite number of
these terms:

$${}^{\le}\phi_{kr}(z) = \lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} P_k^{\,m}\, {}^{\le}\phi_{kr}(z) \qquad (34)$$

Because the operator $P_k$ satisfies the Cesàro limit from (33), the solution of (26) is
independent of the superscript. □

The relaxation due to Lemma 2 demonstrates that the ergodicity condition of Theorem
1 was sufficient, but not necessary. Thus, the conditions under which the Cesàro limit
exists should be determined in hopes of finding a necessary condition.

Lemma 3 Consider a semi-Markov chain state space E that can be expressed as a sum of
disjoint classes according to (5) and (7). Let $P_k = [p_{ij}^{(k)}]$ represent the embedded
Markov chain operator of the unperturbed chain for class $E_k$. If the embedded Markov chain
represented by the operator $P_k$ is: 1) ergodic, or 2) non-ergodic with one and only one
unit eigenvalue, then the Cesàro limit in (33) exists.

Proof: The proof of this lemma is essentially similar to that in [12]. For details of this
proof, see [13]. □
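The distinction between an ordinary limit and a Cesàro limit can be illustrated with a periodic two-state chain, chosen here as a hypothetical example of a non-ergodic operator with exactly one unit eigenvalue (its eigenvalues are 1 and -1). Its powers alternate forever, yet the running average of the powers settles. The function names are illustrative.

```python
def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def cesaro_average(P, n):
    """Return (1/n) * sum of P**m for m = 1..n."""
    power = [row[:] for row in P]
    total = [row[:] for row in P]
    for _ in range(n - 1):
        power = mat_mul(power, P)
        total = [[t + p for t, p in zip(tr, pr)] for tr, pr in zip(total, power)]
    return [[t / n for t in row] for row in total]


# Periodic chain: the state flips at every step, so P**n never converges.
periodic = [[0.0, 1.0], [1.0, 0.0]]
odd_power = mat_mul(mat_mul(periodic, periodic), periodic)   # equals periodic
even_power = mat_mul(periodic, periodic)                     # equals the identity
average = cesaro_average(periodic, 1000)
```

Each row of the averaged matrix is the distribution (0.5, 0.5), which plays the role of the stationary probabilities in the Cesàro limit sense.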

4  Limit  Theorem  with  Time  Scaling 

In FTCS with small single step component failure probabilities, the holding time PMFs
associated with the core matrix sequence elements do not depend on $\varepsilon$ but only on
the FDI decision delay. If a semi-Markov chain is observed in another time scale that is
$1/\delta$ times that of the original time scale, then the PMF $h_{ij}(n)$ will be affected
but the eventual transition probabilities, $p_{ij}^{\varepsilon}$, will remain the same
because they characterize the transition probability from state i to state j regardless of
when the transition takes place. However, the holding time PMFs in the new time scale are not
obtained by simply changing the argument of $h_{ij}(\cdot)$ from n to $n/\delta$. This is
because the summation of $h_{ij}(n/\delta)$ over all non-negative values of the time index
would not be unity and so would not yield a proper holding time function. Instead, the CDF
${}^{\le}h_{ij}(n)$ associated with the PMF $h_{ij}(n)$ must be determined and the argument
of the CDF replaced by $n/\delta$. The new PMF $h_{ij}^{\delta}(n)$ observed in the new time
scale would have most of its probability mass close to the origin. The statistics of the
process in the new time scale will depend on the small parameter $\delta$, the time scaling
factor.
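The rescaling through the CDF can be sketched as follows. This is an illustration under stated assumptions rather than the paper's construction: the rescaled PMF is taken to be the first difference of the original CDF evaluated at the scaled argument, the CDF of a tabulated PMF is treated as a step function (evaluated with a floor), and the PMF values and function names are hypothetical.

```python
import math


def cdf_from_pmf(pmf):
    """Return a step-function CDF H(x) for a tabulated PMF with pmf[0] == 0."""
    cumulative = []
    running = 0.0
    for p in pmf:
        running += p
        cumulative.append(running)

    def H(x):
        index = math.floor(x)  # assumption: the CDF is constant between integers
        if index < 0:
            return 0.0
        return cumulative[min(index, len(cumulative) - 1)]

    return H


def rescaled_pmf(pmf, delta, n_max):
    """Difference the CDF at n/delta to obtain the PMF in the new time scale."""
    H = cdf_from_pmf(pmf)
    return [0.0] + [H(n / delta) - H((n - 1) / delta) for n in range(1, n_max + 1)]


# Hypothetical holding time PMF on n = 0..3 and a scaling factor of one half.
original = [0.0, 0.5, 0.3, 0.2]
scaled = rescaled_pmf(original, 0.5, 2)
```

Because the differences telescope, the rescaled values sum to one whenever the last scaled argument reaches the end of the original support, and the mass at the first time step grows as the scaling factor shrinks.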

Theorem 4 (Limit Theorem With Time Scaling) Let the set E of states of the semi-Markov
chain be expressible as a sum of disjoint classes as in (5). Let $\tau_{kr}^{(i)}$ be the
sojourn of the semi-Markov chain in class $E_k$ when it starts from state $i \in E_k$ and
moves to class $E_r$ for $r \neq k$. If the following two conditions hold for the semi-Markov
chain E:

1. The elements of the core matrix sequence $\{g_{ij}(n) \mid i,j \in E\}$ specifying the
semi-Markov chain depend as follows on the small parameters $\delta$ and $\varepsilon$:

(35) 

Here, ${}^{\le}h_{ij}$ is the transition CDF of the semi-Markov chain in the original time
scale and ${}^{\le}h_{ij}(0) = 0$. The $p_{ij}^{\varepsilon}$ can be expanded in a Taylor
series about $\varepsilon = 0$ as in (7). The embedded Markov chain obeys the usual Markov
chain properties described in (8).

2. The embedded Markov chains defined by the matrices
$\{p_{ij}^{(k)} \mid i,j \in E_k,\ k \in M\}$ are ergodic or non-ergodic with one and only
one unit eigenvalue, with the stationary probabilities (in the Cesàro limit sense)
$\{\pi_i^{(k)} \mid i \in E_k,\ k \in M\}$.

Then:

$$\lim_{\varepsilon \to 0} \Pr\{\tau_{kr}^{(i)} \le t\} = \gamma_{kr}\,\{1 - \exp\{-\Lambda_k t/\alpha\}\} \qquad (36)$$

where the parameters of the enlarged Markov process were defined in Theorem 1 and
$\alpha = \delta/\varepsilon$.

PROOF: The proof of this theorem is essentially identical to that of Theorem 1. For
details of this proof, see [13]. □

It should be noted that an explicit analytical expression of the core matrix sequence,
$G^{\varepsilon}(n)$, is not required to expand the eventual transition probabilities of the
perturbed semi-Markov chain in a Taylor series about $\varepsilon = 0$. The eventual
transition probabilities may be evaluated numerically, which is what would be done in
practice. This is fortunate because the direct form of the core matrix is not always
available [3]. In many cases, the decision time PMFs are tabulated numerically and no
functional form is available.

Also,  the  time  scale  decomposition  of  the  semi-Markov  chain  is  crucial  to  the  use  of  this 
technique.  A  simple  way  of  characterizing  each  class  is  as  follows:  the  first  class  contains 
states  for  which  no  failures  have  occurred,  the  second  class  contains  states  for  which  a  single 
failure has occurred, the third class contains states for which two failures have occurred, etc.
These classes arise by setting $\varepsilon = 0$ and observing which groups of states of the unperturbed
semi-Markov  chain  do  not  communicate. 


Finally, estimates of the original semi-Markov chain state probabilities can be recovered
from the enlarged Markov process. The asymptotic behavior of the unperturbed semi-Markov
chain in each class is given by the stationary probabilities (or Cesàro limit probabilities)
for that class. The class-to-class behavior is determined by the enlarged process. The
approximate state probabilities in each class are:

$$\pi_i^{(k)}(n) = \tilde{\pi}_i^{(k)}\, \pi'_k(n) \qquad (37)$$

where $\tilde{\pi}_i^{(k)}$ denotes the stationary probability of state i within class k and
the approximate class probabilities $\pi'_k(n)$ of the enlarged process are found from its
interval transition probability matrix.
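The recovery step amounts to a product of two factors per state. A minimal sketch, assuming each state's probability is its within-class stationary probability multiplied by the occupancy probability of its class; the function name and the numbers are hypothetical.

```python
def approximate_state_probabilities(class_stationary, class_probs):
    """Spread each class occupancy probability over the states of that class.

    class_stationary[k] : within-class stationary probabilities for class k
    class_probs[k]      : occupancy probability of class k at the time of interest
    """
    return [[p_state * class_probs[k] for p_state in class_stationary[k]]
            for k in range(len(class_probs))]


# Hypothetical model: class 0 holds two states, class 1 holds a single state.
state_probs = approximate_state_probabilities(
    class_stationary=[[0.9, 0.1], [1.0]],
    class_probs=[0.99, 0.01],
)
```

Since the within-class probabilities sum to one in every class, the recovered state probabilities sum to one whenever the class probabilities do.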

5  Performance  Evaluation  of  the  SCMS 

Two  simple  semi-Markov  reliability  models  of  a  single  component  monitoring  system  (SCMS) 
will  be  developed.  The  SCMS  uses  a  sequential  FDI  test  to  monitor  the  status  (failed  or 
working)  of  a  single  component.  The  two  models  will  differ  in  monitoring  policy.  The  first 
example,  SCMS-I,  models  an  FDI  test  that  operates  continuously  over  the  entire  mission 
duration.  The  second  example,  SCMS-II,  models  an  FDI  test  that  is  discontinued  after  the 
first  failure  indication  (namely,  abbreviated  monitoring). 

In  this  section,  the  performance  of  the  SCMS  will  be  evaluated  through  application  of 
the  approximate  method  to  a  semi-Markov  model.  The  procedure  follows:  (1)  semi-Markov 
transition  diagrams  are  constructed  describing  all  of  the  random  events  that  can  take  place, 
(2)  the  direct  form  of  the  core  matrix  sequence  is  derived,  (3)  the  core  matrix  is  placed  in 
standard  form,  (4)  the  performance  is  evaluated  through  application  of  Theorem  4. 

In addition, z-transforms will be used to determine an analytical expression for the
state and class occupancy probabilities, $\pi(n)$ and $\pi'(n)$ respectively. The results of
the z-transform analysis will be used to evaluate the accuracy of the approximate method.
This is possible here because the models are relatively simple. In more general cases, this
would not be practical.

Table 1: State definitions and class decompositions for SCMS-I, II

State   State Definition                   Class
1       Component is working               1
2       Component has a false alarm        1
- - - - - - - - - - - - - - - - - - - - - - - - -
3       System loss - component failed     2

Figure 1: Semi-Markov transition diagram for SCMS-I

5.1  SCMS  with  continuous  monitoring 

Table  1  enumerates  and  defines  the  states  of  a  semi-Markov  chain  reliability  model  of  the 
SCMS-I.  The  dashed  line  in  the  table  distinguishes  the  class  decomposition  of  the  model; 
class  1  contains  states  1  and  2,  class  2  contains  only  state  3. 

The  semi-Markov  transition  diagram  for  the  SCMS-I  is  presented  in  Figure  1.  Two 
aspects  of  this  diagram  should  be  noted.  Given  that  the  chain  has  entered  a  state,  the  lines 
directed  out  of  that  state  represent  transitions  after  the  chain  has  remained  in  that  state  for 
a  period  of  time,  namely,  the  holding  time.  Secondly,  the  dashed  lines  represent  transitions 
whose transition PMFs are proportional to $\varepsilon$. Thus, a dashed line represents the condition
that no such transition occurs when $\varepsilon = 0$. This is a convenient way of depicting the class
decomposition  of  a  semi-Markov  chain  reliability  model. 

A complete statistical description of the sequential test used in the FDI process requires
knowledge of the conditional PMFs of the time to decision of the test. The following two
functions are required:

$f_A^0(n)$  PMF of time to a decision that no failure is present when no failure is present.

$f_B^0(n)$  PMF of time to a failure indication when no failure is present (false alarm).

In  these  PMFs,  the  fault  monitoring  event  at  time  n  must  be  conditioned  on  the  failure 
events  that  take  place  prior  to  and  including  time  n  -  1.  Thus,  it  is  assumed  that  there  is 
a  delay  of  at  least  a  single  time  step  between  when  a  failure  takes  place  and  when  it  can  be 
detected. 

Another necessary function is the sum of all probabilities of all possible test outcomes
(nominal decision, failure indication, and decision not yet available) at a given time n.
It can be specified in terms of the decision time PMFs as:

$$Q_0(n) = 1 - \sum_{k=1}^{n-1} \{ f_A^0(k) + f_B^0(k) \}; \quad n \ge 1 \qquad (38)$$

Note that $Q_0(n)$ is defined only for positive values of the time index n and is defined to
be zero for n = 0. Thus, one of the necessary criteria for a permissible holding time PMF is
maintained: there is no probability mass at the initial time.

The core matrix sequence, $G^{\varepsilon}(n)$, for SCMS-I can be expressed in matrix form as:

$$G^{\varepsilon}(n) = \begin{bmatrix} (1-\varepsilon)^n f_A^0(n) & (1-\varepsilon)^n f_B^0(n) & \varepsilon (1-\varepsilon)^{n-1} Q_0(n) \\ (1-\varepsilon)^n f_A^0(n) & (1-\varepsilon)^n f_B^0(n) & \varepsilon (1-\varepsilon)^{n-1} Q_0(n) \\ 0 & 0 & \delta(n-1) \end{bmatrix} \qquad (39)$$

Any reasonable PMF may be used for the decision time PMFs. However, a closed form
solution for $\pi(n)$ is desired. A simple but realistic choice for the decision time PMFs is
the hypergeometric PMF [13]. This PMF is a good approximation to the holding time behavior
of many sequential tests, as demonstrated by Table 6.6 of [3]. Choosing an appropriate
eventual transition probability yields the hypergeometric decision time PMFs below:

$$f_A^0(n) = A_1 (a^n - b^n); \quad A_1 = (1 - P_{fa})\, \frac{(1-a)(1-b)}{(a-b)} \qquad (40)$$

$$f_B^0(n) = A_2 (c^n - d^n); \quad A_2 = P_{fa}\, \frac{(1-c)(1-d)}{(c-d)} \qquad (41)$$

where 0 < b < a < 1 and 0 < d < c < 1. The parameter $P_{fa}$ is the eventual false alarm
probability of the sequential test. The core matrix can now be expressed in terms of these
PMFs.
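The core matrix sequence can be tabulated numerically and checked against the requirement that each row carries unit total probability. The sketch below assumes decision time PMFs of the two-geometric form (proportional to a^n - b^n and c^n - d^n, normalised to 1 - P_fa and P_fa), identical first and second rows, and a unit impulse at n = 1 for the system loss state. The parameter values and function names are hypothetical.

```python
def scms1_core_sequence(eps, p_fa, a, b, c, d, n_max):
    """Return [G(1), ..., G(n_max)] for SCMS-I, each as a 3 x 3 list of rows."""
    A1 = (1 - p_fa) * (1 - a) * (1 - b) / (a - b)
    A2 = p_fa * (1 - c) * (1 - d) / (c - d)
    sequence = []
    decided = 0.0  # running sum of f_A(k) + f_B(k) for k = 1..n-1
    for n in range(1, n_max + 1):
        f_a = A1 * (a**n - b**n)
        f_b = A2 * (c**n - d**n)
        q0 = 1.0 - decided  # probability that no decision was reached before n
        working_row = [(1 - eps)**n * f_a,
                       (1 - eps)**n * f_b,
                       eps * (1 - eps)**(n - 1) * q0]
        loss_row = [0.0, 0.0, 1.0 if n == 1 else 0.0]
        sequence.append([working_row, list(working_row), loss_row])
        decided += f_a + f_b
    return sequence


def eventual_row_sums(sequence):
    """Sum every row of the core matrix over all tabulated time steps."""
    return [sum(sum(G[i]) for G in sequence) for i in range(3)]


row_sums = eventual_row_sums(
    scms1_core_sequence(eps=1e-3, p_fa=0.05, a=0.8, b=0.5, c=0.7, d=0.4, n_max=400)
)
```

With 400 time steps the truncated tail is negligible for these parameters, so all three row sums agree with one to within rounding.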

A  z-transform  analysis  of  the  semi-Markov  recursion  formula  using  the  above  core  matrix 


sequence  yields  the  state  occupancy  probability  vector 

xi(n)  =  +  (42) 

X2(n)  =  +  (43) 

$$\pi_3(n) = 1 - R^n \qquad (44)$$

and the class occupancy probability vector $\pi'(n)$:

$$\pi'(n) = [\,(1-\varepsilon)^n \quad 1 - (1-\varepsilon)^n\,] \qquad (45)$$


Availability of these analytical results permits comparisons to be made with the approximate
results that exploit the class decomposition to be described below. It should be emphasized
again that the existence of analytical results is rare, and occurs only because the system is
very simple.

In order to derive the enlarged Markov process for this model, $G^{\varepsilon}(n)$ must be
placed in standard form. For an in-class transition, the decomposition is obtained from the
first two terms of the Taylor series expansion of the eventual transition probability about
$\varepsilon = 0$. In addition, the mean waiting times, $\bar{\tau}_{ij}$, must be derived.
For an out-of-class transition, the decomposition is obtained by dividing the eventual
transition probability by $\varepsilon$ and then taking the zeroth order term in the Taylor
series expansion about $\varepsilon = 0$.

Consider an in-class transition from state 1 to state 1. First, the eventual transition
probability is found, with $R = 1 - \varepsilon$:

$$p_{11}^{\varepsilon} = A_1 \left\{ \frac{aR}{(1-aR)} - \frac{bR}{(1-bR)} \right\} \qquad (46)$$

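The decomposition for the transition is:

$$p_{11}^{\varepsilon} = (1 - P_{fa}) - \varepsilon\, (1 - P_{fa})\, \frac{(1-ab)}{(1-a)(1-b)} + o(\varepsilon) \qquad (47)$$

To satisfy the requirements for a permissible holding time function, the holding time
function for this transition must be expressed as:

$$h_{11}(n) = \frac{(1-aR)(1-bR)}{R\,(a-b)} \left\{ (aR)^n - (bR)^n \right\} \qquad (48)$$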

From (15), the mean holding time can be found:

$$\bar{\tau}_{11} = \frac{(1 - abR^2)}{(1-aR)(1-bR)} \qquad (49)$$

Thus,  all  of  the  parameters  required  to  place  this  in-class  transition  PMF  in  standard  form 
have  been  derived. 
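The closed forms for the in-class transition can be checked against direct summation. The sketch below assumes a core matrix element of the form A1 R^n (a^n - b^n) with R = 1 - eps, sums it to obtain the eventual transition probability, and takes the probability-weighted mean of n as the mean holding time. The parameter values and function names are hypothetical.

```python
def p11_closed_form(eps, p_fa, a, b):
    """Eventual transition probability from the geometric-series closed form."""
    R = 1.0 - eps
    A1 = (1 - p_fa) * (1 - a) * (1 - b) / (a - b)
    return A1 * (a * R / (1 - a * R) - b * R / (1 - b * R))


def tau11_closed_form(eps, a, b):
    """Mean holding time from the closed form (1 - a b R^2) / ((1 - aR)(1 - bR))."""
    R = 1.0 - eps
    return (1 - a * b * R**2) / ((1 - a * R) * (1 - b * R))


def p11_tau11_by_summation(eps, p_fa, a, b, n_max=2000):
    """Sum the assumed core matrix element directly over n = 1..n_max."""
    R = 1.0 - eps
    A1 = (1 - p_fa) * (1 - a) * (1 - b) / (a - b)
    g = [A1 * R**n * (a**n - b**n) for n in range(1, n_max + 1)]
    p11 = sum(g)
    tau11 = sum(n * g_n for n, g_n in enumerate(g, start=1)) / p11
    return p11, tau11


p11_sum, tau11_sum = p11_tau11_by_summation(1e-3, 0.05, 0.8, 0.5)
p11_exact = p11_closed_form(1e-3, 0.05, 0.8, 0.5)
tau11_exact = tau11_closed_form(1e-3, 0.8, 0.5)
```

Dividing the core matrix element by its sum is what turns it into a proper holding time PMF, which is why the mean is normalised by the eventual transition probability.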

A second type of core matrix element that must be placed in standard form is one
corresponding to an out-of-class transition such as a transition from state 1 to state 3.
First, the eventual transition probability must be found.

$$p_{13}^{\varepsilon} = \varepsilon \left\{ (1-P_{fa})\, \frac{(1-abR)}{(1-aR)(1-bR)} + P_{fa}\, \frac{(1-cdR)}{(1-cR)(1-dR)} \right\} \qquad (50)$$

The sole parameter required for the approximation technique from this eventual transition
probability is found from:

$$q_{13}^{(1)} = \left. \frac{1}{\varepsilon}\, p_{13}^{\varepsilon} \right|_{\varepsilon=0} = (1-P_{fa})\, \frac{(1-ab)}{(1-a)(1-b)} + P_{fa}\, \frac{(1-cd)}{(1-c)(1-d)} \qquad (51)$$

The eventual transition probabilities of each row of $G^{\varepsilon}(n)$ sum to unity. Thus,
this is a proper semi-Markov chain [4].
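The unit row sum can also be confirmed from closed forms alone. The sketch below assumes that the in-class eventual transition probabilities simplify to w (1-x)(1-y) R / ((1-xR)(1-yR)), with weight w equal to 1 - P_fa or P_fa and R = 1 - eps, and that the out-of-class probability has the two-term form with factors (1 - abR) and (1 - cdR). The parameter values and function names are hypothetical.

```python
def in_class_probability(eps, weight, x, y):
    """Eventual probability of an in-class transition with geometric parameters x, y."""
    R = 1.0 - eps
    return weight * (1 - x) * (1 - y) * R / ((1 - x * R) * (1 - y * R))


def p13_closed_form(eps, p_fa, a, b, c, d):
    """Eventual probability of the out-of-class transition from state 1 to state 3."""
    R = 1.0 - eps
    return eps * ((1 - p_fa) * (1 - a * b * R) / ((1 - a * R) * (1 - b * R))
                  + p_fa * (1 - c * d * R) / ((1 - c * R) * (1 - d * R)))


def q13(p_fa, a, b, c, d):
    """Zeroth order term of p13 / eps about eps = 0."""
    return ((1 - p_fa) * (1 - a * b) / ((1 - a) * (1 - b))
            + p_fa * (1 - c * d) / ((1 - c) * (1 - d)))


eps, p_fa, a, b, c, d = 1e-3, 0.05, 0.8, 0.5, 0.7, 0.4
row_sum = (in_class_probability(eps, 1 - p_fa, a, b)
           + in_class_probability(eps, p_fa, c, d)
           + p13_closed_form(eps, p_fa, a, b, c, d))
q13_value = q13(p_fa, a, b, c, d)
```

The same functions show that the ratio of the out-of-class probability to eps approaches the limiting coefficient as eps shrinks, which is the quantity the approximation uses.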

The  next  step  in  the  procedure  is  to  determine  the  eventual  transition  probability  matrix 
of  the  unperturbed  semi-Markov  chain.  This  is  found  by  setting  e  =  0  and  ignoring  all  time 
varying  terms  in  the  core  matrix: 


l-Pfa  Pfa  0 

1-P/o  Pfa  0 

0  0  1 


(52) 


By  raising  P  to  successively  higher  powers,  the  stationary  interval  transition  probability 
matrix  is  found  to  be  identical  to  (52).  The  embedded  stationary  probability  distribution 
in  partitioned  form  is  thus: 

$$\pi_m = [\,1 - P_{fa} \quad P_{fa} \mid 1\,] \qquad (53)$$
With knowledge of this and of the mean holding times for transitions from state i to j, $\bar{\tau}_{ij}$, it is possible to determine the stationary probability distribution of the unperturbed semi-Markov chain, $\pi^{(k)}$. This probability distribution is needed to approximate the state probability distribution of the original perturbed semi-Markov chain.
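
The statement that raising P in (52) to higher powers reproduces (52) can be verified with a few lines of code. This is a minimal sketch in plain Python; the helper names are illustrative.

```python
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def unperturbed_P(p_fa):
    """Eventual transition probability matrix of the unperturbed chain, as in (52)."""
    return [[1.0 - p_fa, p_fa, 0.0],
            [1.0 - p_fa, p_fa, 0.0],
            [0.0, 0.0, 1.0]]


def matrix_power(P, k):
    """Compute P**k for k >= 1 by repeated multiplication."""
    result = P
    for _ in range(k - 1):
        result = matmul(result, P)
    return result


def max_abs_diff(A, B):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(A, B) for x, y in zip(ra, rb))
```

Because the two rows of the class 1 block are identical and state 3 is absorbing, P is idempotent, so every row of any power of P already equals the partitioned stationary distribution.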


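From semi-Markov theory, the stationary probability distribution for each unperturbed class $E_k$ is given by

$$\pi_i^{(k)} = \frac{\pi_{m,i}^{(k)}\,\bar{\tau}_i^{(k)}}{\bar{\tau}^{(k)}}, \qquad i \in E_k$$

where $\bar{\tau}^{(k)}$ is the mean waiting time of the chain in class $E_k$:

$$\bar{\tau}^{(k)} = \sum_{i \in E_k} \pi_{m,i}^{(k)}\,\bar{\tau}_i^{(k)}$$

$\pi_m^{(k)}$ was determined above, and $\bar{\tau}_i^{(k)}$ is the mean holding time in state i:

$$\bar{\tau}_i^{(k)} = \sum_{j \in E_k} p_{ij}^{(k)}\,\bar{\tau}_{ij}^{(k)}$$

where $\bar{\tau}_{ij}^{(k)}$ is determined from the limit of $\bar{\tau}_{ij}$, defined in (15), as ε → 0.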

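The stationary probability distribution of the unperturbed semi-Markov chain will now be determined. The mean holding times of the unperturbed semi-Markov chain in the first class are:

$$\bar{\tau}_{11}^{(1)} = \bar{\tau}_{21}^{(1)} = \frac{1 - ab}{(1-a)(1-b)}$$

$$\bar{\tau}_{12}^{(1)} = \bar{\tau}_{22}^{(1)} = \frac{1 - cd}{(1-c)(1-d)}$$

The mean holding time in class 1 starting from state i is thus

$$\bar{\tau}_i^{(1)} = (1 - P_{fa})\,\frac{1 - ab}{(1-a)(1-b)} + P_{fa}\,\frac{1 - cd}{(1-c)(1-d)}, \qquad i = 1, 2$$

Similarly, the mean waiting time of the semi-Markov chain in class 1 is:

$$\bar{\tau}^{(1)} = (1 - P_{fa})\,\frac{1 - ab}{(1-a)(1-b)} + P_{fa}\,\frac{1 - cd}{(1-c)(1-d)}$$

Hence, for this situation (but not in general): $\pi^{(1)} = \pi_m^{(1)}$.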
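
The weighting of the embedded stationary probabilities by the mean holding times can be sketched as follows. The function is a generic illustration of the standard semi-Markov result and uses hypothetical names. It shows why the semi-Markov and embedded stationary distributions coincide when every state in the class has the same mean holding time, and why they differ otherwise.

```python
def semi_markov_stationary(pi_embedded, mean_holding):
    """Weight embedded stationary probabilities by the mean holding times.

    Returns the semi-Markov stationary distribution of the class together
    with the mean waiting time of the chain in that class.
    """
    weights = [p * t for p, t in zip(pi_embedded, mean_holding)]
    mean_waiting = sum(weights)
    return [w / mean_waiting for w in weights], mean_waiting


def mean_decision_time(x, y):
    """Unperturbed mean holding time (1 - xy) / ((1 - x)(1 - y))."""
    return (1.0 - x * y) / ((1.0 - x) * (1.0 - y))


# Class 1 of SCMS-I with the parameter set used in the paper.
p_fa = 0.05
tau_nominal = mean_decision_time(0.95, 0.94)
tau_alarm = mean_decision_time(0.89, 0.88)
# Both states share the same mean holding time: a mixture over destinations.
tau_state = (1.0 - p_fa) * tau_nominal + p_fa * tau_alarm
pi_class1, waiting = semi_markov_stationary([1.0 - p_fa, p_fa],
                                            [tau_state, tau_state])
```

If the two states had different mean holding times, for example `[tau_nominal, tau_alarm]`, the result would no longer equal the embedded distribution, which is the sense of "but not in general" above.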

The time scale factor δ is set equal to ε for convenience. It should be noted that δ must be of the same order as ε, but not necessarily equal.


All parameters required to describe the enlarged Markov process have now been stated. The parameters of the approximate class-to-class interval transition CDF can be found as described in Theorem 4: $\gamma_{21} = 1$, $\Lambda_1 = 1$. So, the class-to-class interval transition CDF expressed in the slow time scale is:

$$\hat{\Phi}_{12}(t') = 1 - \exp\{-t'\} \qquad (61)$$

To return to the original time scale, let t' = δt, and recall that δ was chosen to be equal to ε in this case. The rows of the interval transition probability matrix of the enlarged process must sum to unity. Since the semi-Markov chain is always in state 1 at the initial time, the enlarged process is always in class 1 at the initial time. Hence, approximate class occupancy probabilities can be stated directly from the first row of the interval transition probability matrix since $\hat{\pi}^{*}(t) = \hat{\pi}^{*}(0)\,\hat{\Phi}(t)$:

$$\hat{\pi}^{*}(t) = [\,\exp\{-\epsilon t\} \quad 1 - \exp\{-\epsilon t\}\,] \qquad (62)$$

By expanding the approximate Markov process in terms of the stationary probabilities of the unperturbed semi-Markov chain as in (48), approximate expressions for the state occupancy probabilities of the original process can be stated as follows:

$$\hat{\pi}(t) = [\,(1 - P_{fa})\exp\{-\epsilon t\} \quad P_{fa}\exp\{-\epsilon t\} \quad 1 - \exp\{-\epsilon t\}\,] \qquad (63)$$
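
The approximate state occupancy probabilities are simple to evaluate. The sketch below assumes the exponent in (63) is -εt in the original time scale (δ = ε, unit rate in the slow time scale) and that the class 1 probability is split between states 1 and 2 according to the stationary probabilities [1 - P_fa, P_fa]. The function name is illustrative.

```python
import math


def approx_state_probs(t, eps, p_fa):
    """Approximate occupancy probabilities of states 1, 2 and 3 at time step t."""
    survive = math.exp(-eps * t)  # approximate probability of still being in class 1
    return [(1.0 - p_fa) * survive, p_fa * survive, 1.0 - survive]


# Example: probabilities after one MTBF (t = 1 / eps) for eps = 0.00005.
probs_at_mtbf = approx_state_probs(20000, 5e-5, 0.05)
```

The three values always sum to one, and at t = 0 they reduce to the class 1 stationary probabilities with zero probability of failure.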

The approximate expressions above will be compared to the analytical expressions derived using z-transform techniques.

5.2  Discussion  of  Results  for  SCMS-I 

This section examines sources of error associated with the approximate technique for a specific set of system parameters: a = 0.95, b = 0.94, c = 0.89, d = 0.88 and $P_{fa}$ = 0.05. This set of parameters implies a time to detection in the absence of a failure of 16 time steps (3.2 seconds), and a time to a nominal decision in the absence of a failure of 36 time steps (7.2 seconds) for a sample period of 200 milliseconds.
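
The quoted decision times follow from the unperturbed mean holding time expression (1 - xy) / ((1 - x)(1 - y)) evaluated at the parameter pairs (a, b) and (c, d). A short sketch of the arithmetic, with illustrative names:

```python
SAMPLE_PERIOD_S = 0.2  # 200 millisecond sample period


def mean_decision_steps(x, y):
    """Unperturbed mean holding time in time steps."""
    return (1.0 - x * y) / ((1.0 - x) * (1.0 - y))


nominal_steps = mean_decision_steps(0.95, 0.94)    # time to a nominal decision
detection_steps = mean_decision_steps(0.89, 0.88)  # time to detection (false alarm)

nominal_seconds = round(nominal_steps) * SAMPLE_PERIOD_S
detection_seconds = round(detection_steps) * SAMPLE_PERIOD_S
```

Rounded to whole time steps, these give 36 and 16 steps, which correspond to 7.2 and 3.2 seconds at the stated sample period.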

The relative error (in percent), $\Delta_i = |\pi_i(n) - \hat{\pi}_i(n)| / \pi_i(n)$, will be used to compare the approximate and the analytical state occupancy probabilities.


The approximate state probability time histories, $\hat{\pi}(n)$, are compared to those obtained analytically, $\pi(n)$, in Figure 2 for each of the three states. These results are for ε = 0.00005, implying an MTBF of 20,000 time steps (4000 seconds or just over an hour). In this figure, the state probabilities are propagated for a period of one component MTBF. Time is normalized by the MTBF.

The largest error occurs early, especially in the first class. This is due to the fact that the normalized state probabilities in class 1 have not converged to the class 1 stationary probabilities of the unperturbed semi-Markov chain. For example, at the tenth time step the normalized probabilities in class 1 are

$$\pi^{(1)}(10) = [0.9817,\ 0.0183]. \qquad (64)$$

These differ substantially from the class 1 stationary probabilities of the unperturbed semi-Markov chain:

$$\pi^{(1)} = [0.9500,\ 0.0500]. \qquad (65)$$

The approximate method accurately estimates the state probabilities when the normalized probabilities have converged to the stationary probabilities in each class. This
occurs  as  early  as  time  step  200,  and  the  relative  errors  for  states  1  and  2  have  dropped  to 
Ai  =  Aj  =  8.62  X  10"^%,  which  indicates  that  the  estimate  is  closely  tracking  the  exact 
solution. Until time step 200, the approximate method is not valid, which results in large relative errors in the state probabilities.

Another source of error is due to the non-zero value of ε, since Theorem 4 describes $\hat{\Phi}(t)$ in the limit as ε → 0. Obviously, the ε chosen in Figure 2 was “small enough” because the state probabilities were estimated adequately. Figure 3 examines the class 2 (or state 3) probability at 100%, 50% and 25% of an MTBF for a range of values of ε. The relative error decreases markedly with decreasing ε for all three choices of mission time. For large ε (ε > 0.01), the “slow” time scale represented by failure events and the “fast” time scale represented by fault monitoring events are nearly indistinguishable from each other, resulting in poor estimates of the state probabilities. In contrast, for small ε (ε < 0.001), the two time scales are distinct. For ε = 0.00005, the time to a decision is about 36 seconds and the MTBF is 4000 seconds, so the “slow” time scale is approximately 100 times slower than the fast time scale. Therefore, to obtain accurate estimates of the state probabilities, it is imperative that the fast and slow time scales be distinctly separated in terms of their mean holding times. These results suggest a possible rule of thumb for determining whether the time scales are distinct: compute the holding time of the slowest FDI event. For the approximation to be valid, the MTBF of the fastest failure should be at least 100 times longer than this calculated FDI holding time.

Figure 3: Sensitivity to ε for SCMS-I. The relative error is plotted versus the single-step probability, ε, for mission times of one MTBF, 0.5 × MTBF, and 0.25 × MTBF.

The analytical and approximate solutions of the class 2 probability can also be compared by expanding each in a Taylor series about ε = 0. If the two are the same to first order in ε then the estimate is a first order perturbation solution. If they differ, this would suggest that an alternative estimate could be derived. Expanding $\pi_2^{*}(n)$ and $\hat{\pi}_2^{*}(n)$ in Taylor series about ε = 0:

$$\pi_2^{*}(n) = n\epsilon - \tfrac{1}{2}(n^2 - n)\epsilon^2 + O(\epsilon^3) \qquad (66)$$

$$\hat{\pi}_2^{*}(n) = n\epsilon - \tfrac{1}{2}n^2\epsilon^2 + O(\epsilon^3) \qquad (67)$$

To first order in ε:

$$\pi_2^{*}(n) = \hat{\pi}_2^{*}(n) = n\epsilon + O(\epsilon^2) \qquad (68)$$
So, the approximation developed in Theorem 4 produces a first order perturbation solution in ε for this model. Therefore, the error between the analytical and approximate class 2 probabilities begins with the order ε² terms. Note that the dominant second order term (n²ε²) is also the same. It can be shown [13] that the error is due to a difference in a second order term with a small coefficient, namely a term that is proportional to elapsed time. Although this observation is strongly model dependent, it may also be true for other models.

Figure 4: Semi-Markov transition diagram for SCMS-II


5.3  The  SCMS  with  abbreviated  monitoring 

A  second  method  of  fault  monitoring  is  to  deploy  a  sequential  test  that  monitors  the  status 
of  a  component  until  a  failure  is  indicated,  at  which  point  the  sequential  test  is  discontinued. 
An  SCMS  of  this  type  will  be  denoted  by  SCMS-II. 

The states for the semi-Markov model of the SCMS-II are enumerated in Table 1. The semi-Markov transition diagram of the SCMS-II is depicted in Figure 4. The class decomposition of the SCMS-II is similar to SCMS-I. However, in this case, the embedded Markov chain in class 1 is non-ergodic.

The direct form of the core matrix sequence can be developed in the same manner as for the SCMS-I. A notable difference is in the transition probabilities out of state 2. Because the fault monitoring test is discontinued upon a failure indication, only failure events cause such transitions. A reset of state 2 occurs when no failure occurs. A transition from state 2 to state 3 occurs only if a failure takes place. Assuming geometrically distributed failures, the core matrix can be stated:


(l-0"/^(n)  r(l  -  rr‘Qo(n) 

G‘(n)  =  0  (l-£)^(n-l)  e5(n  — 1) 

0  0  S{n-1) 


As for the SCMS-I, $\pi(n)$ can be found using z-transforms. The state probability time histories could not be obtained, however, because the partial fraction expansions could only be done numerically. These results are described fully in Appendix B of [13]. However, the class probabilities were found and are stated below:

$$\pi^{*}(n) = [\,(1-\epsilon)^n \mid 1 - (1-\epsilon)^n\,]$$

Again, these analytical expressions for $\pi(n)$ and $\pi^{*}(n)$ will be compared to the results derived using the approximate technique in the next section.

To generate the approximate solutions, the core matrix must be placed in standard form. However, all of the required quantities are known based on the manipulations performed for the SCMS-I. The eventual transition probability matrix of the unperturbed semi-Markov chain is obtained by setting ε = 0 and ignoring the holding time PMFs:

$$P = \begin{bmatrix} 1 - P_{fa} & P_{fa} & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$


By raising this matrix to successively higher powers, the stationary interval transition probability matrix can be found, and the embedded stationary probability distribution in partitioned form is:

$$\pi_m = [\,0 \quad 1 \mid 1\,]$$
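
For a non-ergodic embedded chain such as this one, the stationary probabilities can be read as Cesàro (time-averaged) limits. The sketch below averages the state distribution of the class 1 block over many steps, starting from state 1. It is an illustration in plain Python with hypothetical helper names, and the number of averaging steps is arbitrary.

```python
def cesaro_limit(P, start, n_terms):
    """Average the state distribution over steps 1..n_terms of the chain P."""
    n = len(P)
    dist = list(start)
    running = [0.0] * n
    for _ in range(n_terms):
        # One step of the chain: row vector times transition matrix.
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
        running = [r + d for r, d in zip(running, dist)]
    return [r / n_terms for r in running]


def scms2_class1_block(p_fa):
    """Class 1 block of the unperturbed SCMS-II matrix: state 2 is never left."""
    return [[1.0 - p_fa, p_fa],
            [0.0, 1.0]]


limit = cesaro_limit(scms2_class1_block(0.05), [1.0, 0.0], 100000)
```

The averaged distribution approaches [0, 1] at a rate proportional to one over the number of terms, which matches the class 1 part of the partitioned distribution above.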


Because of the model structure, it is clear that the stationary probabilities for each class of the unperturbed semi-Markov chain are: $\pi^{(k)} = \pi_m^{(k)}$. For this analysis, the time scale factor δ is again set equal to ε. Finally, $\gamma_{21} = 1$ and $\Lambda_1 = 1$, so that the approximate expressions for the class probabilities can be found:

$$\hat{\pi}^{*}(t) = [\,\exp(-\epsilon t) \quad 1 - \exp(-\epsilon t)\,]$$


By expanding the enlarged Markov process in terms of the stationary probabilities of the unperturbed semi-Markov chain, approximate expressions for the state occupancy probabilities of the original process can be stated:

$$\hat{\pi}(t) = [\,0 \quad \exp(-\epsilon t) \quad 1 - \exp(-\epsilon t)\,]$$


5.4  Discussion  of  results  for  SCMS-II 

The approximate state probability time histories, $\hat{\pi}(n)$, are compared to those obtained analytically, $\pi(n)$, in Figure 5 for each of the three states. These results are for the same parameter set as SCMS-I. The largest absolute errors occur in estimating state 1 and do not attenuate until 50% of an MTBF has passed. The approximation estimates the state 1 probability to be zero because the class 1 embedded Markov chain is non-ergodic and yields zero for the stationary state 1 probability. The estimated state probabilities in states 2 and 3 are very accurate with relative errors of less than 0.01% for all time steps.

The relative error in state 1 is 100% at all times because the normalized probabilities in class 1 cannot converge to the stationary probabilities of the unperturbed semi-Markov chain. This is because the state 1 probability will never be exactly zero. For example, at the tenth time step in class 1 the normalized state probabilities are

$$\pi^{(1)}(10) = [0.981175,\ 0.014111], \qquad (75)$$

and the unperturbed stationary probabilities are:

$$\pi^{(1)} = [0,\ 1]. \qquad (76)$$

The approximate method requires that the normalized probabilities converge to the stationary probabilities for each class in order to obtain accurate state probability estimates.

The other source of error is due to non-zero ε. In Figure 5, the value of ε was small enough to provide accurate results because the state 2 and 3 probabilities were estimated adequately. Figure 6 presents the class 2 (or state 3) occupancy probability for mission times of 100%, 50% and 25% of an MTBF for a range of values of ε corresponding to a component MTBF ranging from 4 seconds to 5555 hours. As was the case for the SCMS-I, the relative error decreases markedly with decreasing ε for the three choices of mission time. This reiterates the observation that the fast and slow time scales must be distinct in terms of their mean holding times in order to obtain accurate estimates of the state probabilities. This analysis also demonstrates the usefulness of the rule of thumb suggested earlier.

Figure 6: Sensitivity to ε for SCMS-II. The relative error is plotted versus the single-step probability, ε, for mission times of 1 MTBF, 0.5 × MTBF, and 0.25 × MTBF.
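
The trend described for Figure 6 can be reproduced approximately for the class 2 probability alone. The sketch below is not the procedure used to generate the figure. It assumes the analytical class 2 probability is 1 - (1 - ε)^n and the approximate one is 1 - exp(-εn), and evaluates the relative error at a mission time expressed as a fraction of the MTBF (taken as 1/ε time steps). The function name and the list of ε values are illustrative.

```python
import math


def class2_relative_error(eps, mission_fraction=1.0):
    """Relative error of the approximate class 2 probability at a given mission time."""
    n = max(1, round(mission_fraction / eps))  # mission time in time steps
    # expm1/log1p keep precision when eps is very small.
    exact = -math.expm1(n * math.log1p(-eps))   # 1 - (1 - eps)**n
    approx = -math.expm1(-eps * n)              # 1 - exp(-eps * n)
    return abs(exact - approx) / exact


eps_values = [5e-2, 1e-2, 1e-3, 1e-4, 1e-5]
errors_full_mtbf = [class2_relative_error(e, 1.0) for e in eps_values]
errors_quarter_mtbf = [class2_relative_error(e, 0.25) for e in eps_values]
```

Under these assumptions the relative error falls steadily as ε decreases for each mission time, which is consistent with the qualitative behavior reported above.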

The Taylor series expansions for the analytical and approximate class 2 probability will again be compared. Expanding the class 2 occupancy probability in a Taylor series about ε = 0 yields

$$\pi_2^{*}(n) = n\epsilon - \tfrac{1}{2}(n^2 - n)\epsilon^2 + O(\epsilon^3) \qquad (77)$$

$$\hat{\pi}_2^{*}(n) = n\epsilon - \tfrac{1}{2}n^2\epsilon^2 + O(\epsilon^3) \qquad (78)$$

To first order, $\pi_2^{*}(n)$ and $\hat{\pi}_2^{*}(n)$ are identical. This proves that the approximate method produces a first order perturbation solution in ε for this model. The two expressions begin to differ starting with the ε² terms, but the dominant second order term (n²ε²) is the same. Hence, the error can be expressed as:

$$\pi_2^{*}(n) - \hat{\pi}_2^{*}(n) = \tfrac{1}{2}n\epsilon^2 + O(\epsilon^3) \qquad (79)$$

which is second order in ε and proportional to time, which emphasizes the asymptotic nature of the approximation. Again, this observation is model dependent. However, the same behavior was found for the SCMS-I.
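
The leading error term can be checked numerically. The sketch below assumes the analytical class 2 probability is 1 - (1 - ε)^n and the approximate one is 1 - exp(-εn), and compares their difference with the term nε²/2. The names are illustrative.

```python
import math


def class2_error(n, eps):
    """Analytical minus approximate class 2 probability at time step n."""
    exact = -math.expm1(n * math.log1p(-eps))   # 1 - (1 - eps)**n
    approx = -math.expm1(-eps * n)              # 1 - exp(-eps * n)
    return exact - approx


def leading_error_term(n, eps):
    """Second order term that is proportional to elapsed time."""
    return 0.5 * n * eps ** 2


# Ratio of the actual error to the leading term at time step 200, eps = 0.00005.
ratio = class2_error(200, 5e-5) / leading_error_term(200, 5e-5)
```

For εn much smaller than one the ratio is close to unity. It drifts away from one as εn grows, because higher order terms in the expansion are no longer negligible.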

6  The  SCDR  System  Model 

The  single-component  dual-redundant  (SCDR)  system  consists  of  two  identical  components, 
a  primary  and  a  backup,  operating  in  parallel.  An  independent  sequential  test  monitors 
the status of each component. The reliability of this system was evaluated using the approximate technique in [13]. In the interest of brevity, the interested reader is referred to that work for the details.

7  Conclusions 

A primary contribution of this work is the extension of Korolyuk’s limit theorem for semi-Markov processes to semi-Markov chains in Theorem 1, which describes the conditions under which a perturbed semi-Markov chain can be approximated by an enlarged Markov process. Moreover, Theorem 1 describes how the parameters of the enlarged Markov process are derived from the parameters of the semi-Markov chain.

Two problems arise in applying Theorem 1 to fault tolerant control system (FTCS) models. First, the non-perturbed embedded Markov chains in each class are usually non-ergodic. Ergodicity was required in Theorem 1, but this requirement was relaxed to the existence of the Cesàro limit probabilities in Lemma 2. These were found to exist in Lemma 3 if the embedded Markov chain was either ergodic, or non-ergodic with one and only one unity eigenvalue.

Second, the transition PMFs are typically not functions of the perturbation parameter ε. This problem was mitigated by introducing the concept of time scaling in Theorem 4. The form of the transition PMFs was generalized to include those common to FTCS reliability models. This generalization included a dependence on a time scaling factor δ and on a small parameter ε that determined the state space partitioning of the original semi-Markov chain.

Use of the approximate technique was demonstrated by two simple examples. Accurate estimates of the state probabilities were determined for situations where ε was “small enough” and where the normalized probabilities in each class had converged to the stationary probabilities of the non-perturbed semi-Markov chain. In the two examples presented, the approximate technique yielded a first order perturbation solution in ε to the analytically obtained class probabilities.

The approximation error was found to be insignificant if the slow and fast time scales were distinct. Finally, a rule of thumb was suggested by the error analysis: the slow and fast time scales are distinct if the MTBF of the fastest failure is at least 100 times longer than the mean decision time of the slowest FDI event.

8  Acknowledgments 

This  research  was  wholly  supported  by  the  U.S.  Air  Force  Office  of  Scientific  Research  under 
grant  AFOSR-84-0160. 

References 

[1]  B.  Walker,  N.  Wereley,  R.  Luppold,  and  E.  Gai,  “Effects  of  redundancy  management 
on  reliability  modeling,”  1988.  Submitted. 

[2]  E.  Gai,  J.  Harrison,  and  R.  Luppold,  “Reliability  analysis  of  a  dual  redundant  engine 
controller,”  in  Proceedings  of  the  SAE  Aerospace  Congress  and  Exposition,  1982. 

[3] B. Walker, A Semi-Markov Approach to Quantifying Fault Tolerant System Performance. PhD thesis, Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 1980.

[4] R. Howard, Dynamic Probabilistic Systems, Volume 2: Semi-Markov and Decision Processes. Wiley and Sons, 1971.


[5]  E.  Lewis  and  F.  Bohm,  “Monte  Carlo  simulation  of  Markov  unreliability  models,” 
Nuclear  Engineering  and  Design,  vol.  77,  no.  1,  pp.  49-62,  1984. 

[6]  T.  Zhuguo  and  E.  Lewis,  “Component  dependency  models  in  Markov  Monte  Carlo 
simulation,”  Reliability  Engineering,  vol.  13,  no.  1,  pp.  49-62,  1985. 

[7] K. Trivedi and J. Dugan, “Hybrid reliability modeling of fault-tolerant computer systems,” Computer and Electrical Engineering, vol. 11, no. 2/3, pp. 87-108, 1984.

[8] K. Trivedi, J. Dugan, R. Geist, and M. Smotherman, “Modeling imperfect coverage in fault tolerant systems,” in Proceedings of the Fourteenth International Conference of Fault-Tolerant Computing, pp. 77-82, 1984.

[9]  V.  Korolyuk,  L.  Polishchuk,  and  A.  Tomusyak,  “A  limit  theorem  for  semi-Markov 
processes,”  Cybernetics,  vol.  5,  no.  4,  pp.  524-526,  1969. 

[10] G. Korn and T. Korn, Mathematical Handbook for Scientists and Engineers. McGraw-Hill, 1968.

[11] J. Kemeny and J. Snell, Finite Markov Chains. Springer-Verlag, 1976.

[12] S. Chu, Approximate behavior of generalized Markovian models of fault tolerant systems. Master’s thesis, Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 1986.

[13] N. Wereley, An approximate method for evaluating generalized Markov chain reliability models of fault tolerant systems. Master’s thesis, Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 1987.


